Example: White-bearded wildebeest (2010-2013)
We will use a dataset collected during my PhD field research for all future exercises in this course. The dataset consists of 36 adult white-bearded wildebeest (Connochaetes taurinus) fitted with Lotek WildCell™ GPS tracking devices that were tracked across three separate ecosystems (Amboseli Basin, Athi-Kaputiei Plains, and the Greater Maasai Mara) in southern Kenya from 2010-2013. These data are freely available on Movebank. Wildebeest are particularly important to ecosystem function and diversity, yet many resident populations have become threatened with extinction over the past few decades. The goal of this research was to understand how natural and anthropogenic changes were affecting the movements of wildebeest fitted with tracking devices.
These data can be referenced as:
Stabach JA, Hughey LF, Crego RD, Fleming CH, Hopcraft JGC, Leimgruber P, Morrison TA, Ogutu JO, Reid RS, Worden JS, Boone RB. 2022. Increasing anthropogenic disturbance restricts wildebeest movement across East African grazing systems. Frontiers in Ecology and Evolution. doi.org/10.3389/fevo.2022.846171
Stabach JA, Hughey LF, Reid RS, Worden JS, Leimgruber P, Boone RB. 2020. Data from: Comparison of movement strategies of three populations of white-bearded wildebeest. Movebank Data Repository. doi:10.5441/001/1.h0t27719
In this exercise and those that follow, you will prepare and analyze these data with the moveHMM package. In previous exercises, we analyzed data as a continuous-space, continuous-time stochastic process. Here, we will analyze data in discrete time to identify changes in animal behavior.
As a first step, we must clean the data we received before taking
any additional steps. This includes filtering the dataset for
completeness, identifying duplicate records, removing invalid start or
stop dates, identifying positions of poor data quality, and ordering the
dataset sequentially so that we can import the data into
moveHMM for analyses. Nearly all manufacturers report some
type of positional quality, often reported as the type of position
(e.g., 1D, 2D, or 3D) or Dilution of Precision (DOP - horizontal or
vertical). Large DOP values indicate poor positional quality and can be
filtered from the dataset.
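As a preview of the DOP filter described above, poor-quality fixes can be dropped with dplyr. The threshold of 5 below is purely illustrative (an assumption, not a universal rule); choose a cutoff appropriate for your device and study.

```r
library(dplyr)

# Hypothetical example: drop fixes with poor positional quality.
# A DOP threshold of 5 is an illustrative choice, not a universal rule.
fixes <- data.frame(id  = c("A", "A", "B"),
                    DOP = c(2.1, 8.4, 1.6))

good_fixes <- fixes %>%
  filter(DOP <= 5)   # keep only fixes with acceptable DOP

nrow(good_fixes)  # 2
```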
Load the required libraries and remove everything held in R’s memory.
# Remove from memory
rm(list=ls())
# You may need to install these packages first
#install.packages(c('svDialogs', 'tidyverse', 'move', 'lubridate', 'tmap'))
# Load required libraries
library(svDialogs)
library(tidyverse)
library(move)
library(lubridate)
library(tmap)
#library(sf)
#library(amt)
#library(adehabitatLT)
#library(gt)
#library(cowplot)
#library(suncalc)
I prefer to keep items that require user input at the top of my scripts. For me, this is helpful when transitioning between different studies because most of the introductory code is essentially the same, except for the time zone and local projection used. I recommend importing your data in UTC time with geographic (Lat/Long) coordinates and then converting these values to your local time zone and coordinate system.
In this example, we will update UTC (Coordinated Universal Time) to East Africa Time (EAT) since our data were collected in Kenya. See the wiki for your appropriate timezone identifier. We will also project all geographic data to Universal Transverse Mercator (UTM) zone 37 south. The unit of measurement of this coordinate system is meters, which makes it straightforward to calculate meaningful distances between points.
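The conversion from UTC to EAT can be sketched with lubridate's with_tz(), which changes the display time zone without altering the underlying instant (Kenya is UTC+3 with no daylight saving):

```r
library(lubridate)

# A UTC timestamp converted to East Africa Time (Africa/Nairobi, UTC+3)
utc_time   <- ymd_hms("2010-10-15 16:00:00", tz = "UTC")
local_time <- with_tz(utc_time, tzone = "Africa/Nairobi")

utc_time    # "2010-10-15 16:00:00 UTC"
local_time  # "2010-10-15 19:00:00 EAT"
```

Note that both objects represent the same moment in time; only the time zone used for printing differs.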
Identifying the correct UTM zone for your study area is relatively simple, and there are multiple ways to do it.
NOTE: The data we are using span multiple UTM zones (UTM 36 south and UTM 37 south). In this case, it isn’t a problem because we will be subsetting our data and only using data that overlap with UTM 37 south. Other coordinate systems, such as Lambert or Albers Equal-Area conic projections, could be used for datasets that span multiple UTM zones, minimizing distortion. It’s important to correctly input the parameters of your selected coordinate system.
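Because UTM zones are 6° of longitude wide, numbered 1-60 starting at 180°W, the zone can also be computed directly from a longitude value. A quick sketch:

```r
# UTM zone from longitude: zones are 6 degrees wide, numbered 1-60 from 180W
utm_zone <- function(longitude) {
  floor((longitude + 180) / 6) + 1
}

# The Athi-Kaputiei Plains sit near 36.85 degrees E, which falls in zone 37
utm_zone(36.85)  # 37
```

(This simple formula ignores the handful of exception zones around Norway and Svalbard, which do not apply here.)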
# Set TimeZone and UTM Zone
# Other timezones can be found at: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
Timezone1 <- 'UTC'
Timezone2 <- "Africa/Nairobi"
# UTM Zone
LatLong.proj <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs" # EPSG:4326
UTMZone.proj <- "+proj=utm +zone=37 +south +ellps=WGS84 +datum=WGS84 +units=m +no_defs" # EPSG:32737
#UTMZone.proj <- "+init=epsg:32737"
All data are available via Movebank. These data have already
been downloaded and are available in your /Data folder as a
.csv. Included is a reference file (also downloaded from Movebank), which contains
additional details (sex, age) about each animal.
Most important is that any movement dataset included in an analysis has, at a minimum, an individual identifier, a timestamp, and geographic coordinates (latitude/longitude).
NOTE: If you open the file provided in Excel, do NOT save the file before loading it in R. Excel will reformat the timestamp column to match the settings on your computer - obviously not what you want for most studies.
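One way to guard against mis-parsed timestamps, regardless of how the file was saved, is to tell readr the column type explicitly rather than relying on type guessing. The format string below matches the Movebank convention, but check it against your own file:

```r
library(readr)

# Force the timestamp column to parse as a UTC datetime on import
# (the column name and format must match your own CSV)
WB <- read_csv("./Data/White-bearded wildebeest (Connochaetes taurinus) movements - Kenya.csv",
               col_types = cols(
                 timestamp = col_datetime(format = "%Y-%m-%d %H:%M:%S")
               ))
```

If the format string does not match, readr will produce NA values and a parsing warning instead of silently guessing, which makes problems easier to catch.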
# We can upload each file directly and convert to a dataframe
#WB <- as.data.frame(read_csv("./Data/White-bearded wildebeest (Connochaetes taurinus) movements - Kenya.csv"))
# Or list the files so we don't have to type so much
# Create list
lf <- list.files(path="./Data/", pattern = '.csv', all.files=FALSE, full.names=TRUE)
lf
## [1] "./Data/White-bearded wildebeest (Connochaetes taurinus) movements - Kenya-reference-data.csv"
## [2] "./Data/White-bearded wildebeest (Connochaetes taurinus) movements - Kenya.csv"
WB <- as.data.frame(read_csv(lf[2]))
WB.ref <- as.data.frame(read_csv(lf[1]))
# Look at the data
# head(WB)
# head(WB.ref)
# Fix a few column name issues
names(WB) <- gsub("-", "_", names(WB))
names(WB) <- gsub(":", "_", names(WB))
names(WB.ref) <- gsub("-", "_", names(WB.ref))
# Check to identify the changes
# head(WB)
# head(WB.ref)
# We could also pull the file directory from Movebank. To do so, you will need a Movebank UserName and Password.
# This is particularly useful when data continue to be collected (i.e., are actively streaming) on a study.
# Note: When pulling from Movebank, the data will contain a few more data fields than the uploaded CSV
# Set Movebank Login Details
# UN <- dlgInput("Enter Movebank UserName: ", Sys.info()[""])$res
# PW <- dlgInput("Enter Movebank Password: ", Sys.info()[""])$res
# Details
# login <- movebankLogin(username=UN, password=PW)
# Pull Data from Movebank and convert to a dataframe
# WB <- as.data.frame(getMovebankData(study = "White-bearded wildebeest (Connochaetes taurinus) movements - Kenya", login = login))
# Reference Data - No need to import here, as data are already subset in Movebank
# WB.ref <- getMovebankReferenceTable(study = "White-bearded wildebeest (Connochaetes taurinus) movements - Kenya", login = login)
It’s always good practice to look at your data and make sure the dataset has loaded correctly. We should have 36 animals across 3 distinct ecosystems. The timezone should be UTC. We can also use some simple commands to summarize the dataset, including:
head() to view the first few lines of the data object
tail() to view the last few lines of the data object
dim() to print the dimensions of the data object (i.e., rows and columns)
str() to provide the structure of the data object
Question 1: How would you change the code to view the last few lines of the dataset? What would you do to view the first 10 rows?
head(WB)
Question 2: What tags are included in the study? How many?
sort(unique(WB$tag_local_identifier))
length(unique(WB$tag_local_identifier))
Question 3: How many study areas are included in the dataset? What are the study area names?
length(unique(WB.ref$study_site))
unique(WB.ref$study_site)
Question 4: What is the structure of the dataset? What is the data type of the timestamp column?
str(WB)
str(WB.ref)
Question 5: How many rows and columns are there in the movement dataset?
dim(WB)
nrow(WB)
ncol(WB)
Question 6: What is the timezone of the dataset?
tz(WB$timestamp)
Many of the columns included in the dataframes are unnecessary (and quite long). We need to clean and filter each dataframe and can do so with tools from the tidyverse package. Here, we will update the timezone to the local time zone and subset the start/end dates for each animal to match what we recorded in our reference dataset. These dates represent the dates that the collar was deployed on the animal. To do so, we need to join the two dataframes together using a shared primary key (i.e., id). Lastly, we will check the completeness of the dataset and filter out erroneous data points.
NOTE: date/time fields can often present issues and
be difficult to import. If problems exist, you may need to properly
format the date/time fields first. See the lubridate
package and/or the base R strptime
function for instructions on formatting your date/time field. Note the format of
YOUR own dataset (e.g., “%Y-%m-%d %H:%M:%S”). Also note
that some manufacturers put the date and time in separate columns.
You’ll need to paste these columns together to create a timestamp. For
example,
timestamp = ymd_hms(paste(date, time), tz = 'Africa/Nairobi')
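As a minimal self-contained sketch of that paste-and-parse step (with made-up date and time columns standing in for a manufacturer export):

```r
library(lubridate)

# Hypothetical manufacturer export with date and time in separate columns
fixes <- data.frame(date = c("2010-10-15", "2010-10-15"),
                    time = c("16:00:00", "17:00:00"))

# Paste the two columns together and parse as a local timestamp
fixes$timestamp <- ymd_hms(paste(fixes$date, fixes$time),
                           tz = "Africa/Nairobi")

tz(fixes$timestamp)  # "Africa/Nairobi"
```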
# Clean the reference file, selecting only the columns that you want to include
# Important is that the file contains the deployment on and off date and times. These dates will allow us to filter/remove datapoints outside of the deployment window
# Using the pipe (%>%) operator here to join a series of commands together
WB <- WB %>%
# Pull in the WB.ref dataset to join the study site column
left_join(WB.ref, by = join_by("tag_local_identifier" == "tag_id")) %>%
# Rename columns and create local timestamp - UTC to local
# Select only columns of interest
transmute(id = tag_local_identifier,
animal_id = individual_local_identifier,
latitude = location_lat,
longitude = location_long,
sex = animal_sex,
DOP = gps_dop,
fixType = gps_fix_type_raw,
temp = external_temperature,
timestamp = with_tz(timestamp, tz=Timezone2),
deploy_on = with_tz(deploy_on_date, tz=Timezone2),
deploy_off = with_tz(deploy_off_date, tz=Timezone2),
study_site = study_site) %>%
# Correct the data type for specific fields
mutate(across(c(id,sex,study_site), as.factor)) %>%
# Make sure no duplicate id and timestamp exist.
distinct(animal_id, timestamp, .keep_all = TRUE) %>%
# Remove any records that don't have a timestamp or a Lat/Long location
filter(!is.na(timestamp),
!is.na(latitude),
!is.na(longitude),
latitude != 0,
longitude != 0,
# Grab only the Athi-Kaputiei Data
study_site == "Athi-Kaputiei Plains",
# And use the deploy on and off dates to further subset
# (Won't do anything here, as dates already subset)
timestamp >= deploy_on & timestamp <= deploy_off) %>%
# Remove fields that are now unnecessary. Dropping the deployment dates and the study area
dplyr::select(-c(deploy_on, deploy_off, study_site)) %>%
# Remove extra levels (important since subsetting to a single study area)
# The droplevels function re-assess what levels are in the data and drops the rest
droplevels() %>%
# Arrange the dataset by id and timestamp
arrange(id, timestamp)
# Look again (yes again!) at your data
head(WB)
## id animal_id latitude longitude sex DOP fixType temp timestamp
## 1 2840 Sotua -1.358896 36.84575 m 2.6 3 29 2010-10-15 16:00:00
## 2 2840 Sotua -1.360489 36.84604 m 2.2 3 28 2010-10-15 17:00:00
## 3 2840 Sotua -1.360607 36.84662 m 2.0 3 25 2010-10-15 18:00:00
## 4 2840 Sotua -1.361414 36.84697 m 2.2 3 21 2010-10-15 21:00:00
## 5 2840 Sotua -1.362291 36.84757 m 3.0 3 18 2010-10-16 00:00:00
## 6 2840 Sotua -1.361920 36.84637 m 1.2 3 18 2010-10-16 03:00:00
str(WB)
## 'data.frame': 99710 obs. of 9 variables:
## $ id : Factor w/ 12 levels "2840","2842",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ animal_id: chr "Sotua" "Sotua" "Sotua" "Sotua" ...
## $ latitude : num -1.36 -1.36 -1.36 -1.36 -1.36 ...
## $ longitude: num 36.8 36.8 36.8 36.8 36.8 ...
## $ sex : Factor w/ 2 levels "f","m": 2 2 2 2 2 2 2 2 2 2 ...
## $ DOP : num 2.6 2.2 2 2.2 3 1.2 1.8 3.4 4.2 3.6 ...
## $ fixType : num 3 3 3 3 3 3 3 3 3 3 ...
## $ temp : num 29 28 25 21 18 18 17 18 22 23 ...
## $ timestamp: POSIXct, format: "2010-10-15 16:00:00" "2010-10-15 17:00:00" ...
# What tags are included in the study now? How many?
sort(unique(WB$id))
## [1] 2840 2842 30068 30070 30071 30072 30074 30077 30079 30082 30084 30086
## 12 Levels: 2840 2842 30068 30070 30071 30072 30074 30077 30079 30082 ... 30086
length(unique(WB$id))
## [1] 12
Next, we can convert the data to a spatial (sf) object and plot the locations to confirm that they fall where we expect. The code below is left commented out; uncomment it (and load the sf package) to run it.
# # Create sf object
# WB.sf <- WB %>%
# st_as_sf(coords = c('longitude', 'latitude'),
# crs = LatLong.proj) %>%
# st_transform(UTMZone.proj)
#
# # Plot the data
# plot(WB.sf["animal_id"],
# main = paste0("Wildebeest: Athi-Kaputiei Plains (n = ", length(unique(WB.sf$id)),")"))
#
# # We can do better than that, but at least this confirms what I'd expect
# WB.sf %>%
# ggplot() +
# geom_sf(aes(fill = animal_id),
# alpha = 0.6,
# shape = 21,
# col = "black") +
# theme_bw()
#
# # or separate each individual into its own plot
# WB.sf %>%
# ggplot() +
# geom_sf(aes(fill = animal_id),
# alpha = 0.6,
# shape = 21,
# col = "black") +
# theme_minimal() +
# facet_wrap(~ animal_id)
# This is perhaps the most helpful view for getting a sense of the data collected for each individual, how much their space use varies, and whether there are errant points that may represent actual GPS error. Here fortunately there are no points that can be identified visually as significant outliers. Be sure to use the "zoom" button above the plot to see it in larger size.
# Let's do the same thing using tmap, overlaying the data on a satellite map to investigate potential problems
# tmap maps can be made in "view" (interactive) mode or "plot" mode. Plot mode is better for making static maps
# tmap_mode("view")
#
# # Here we can select a range of basemaps. And you can find a list of available maps HERE.
# # https://leaflet-extras.github.io/leaflet-providers/preview/
#
# # You can also type "providers$" in the console to see all the options, but this won't allow you to easily preview them.
#
# # Here I'm pulling a world satellite map, and a world street map.
# # Note also that in tmap we need to refer to our variables in quotes.
#
# tm_basemap(c("Esri.WorldImagery",
# "OpenStreetMap.HOT")) +
# tm_shape(WB.sf,
# name = "Wildebeest Locations (Athi-Kaputiei)") +
# tm_dots(size = 0.05,
# title = "Wildebeest ID",
# col = "animal_id",
# alpha = 0.5)
# With this map, you can zoom in and out to investigate points from a particular animal. You can also toggle on and off the various basemap tiles you have selected using the tile icon in the top left corner. If you plotted other data as well, you can also toggle these layers on and off.
# Spend some time zooming into the map. You can see a large area in the central north of the study region with no locations. Once you zoom in you can see that this area is developed and that the wildebeest do not venture into this part of the landscape. There very well may be fencing excluding the herds from these areas as well. In the extreme north of the study area is the Kenyan capital Nairobi. This is more obvious when clicking on the street map basemap.
# Again here though, I do not see anything visually that would signal large positional errors. Let's dig into error a bit more in the next section.